347 research outputs found

    Sampling through time and phylodynamic inference with coalescent and birth-death models

    Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth-death-sampling model, in the context of estimating population size and birth rates in a population growing exponentially according to the birth-death branching process. For sequences sampled at a single time, we find that the coalescent and the birth-death-sampling model give virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth-death model estimators are subject to large bias if the sampling process is misspecified. Since birth-death-sampling models incorporate a model of the sampling process, we show how much of the statistical power of birth-death-sampling models arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information.

    Understanding drivers of phylogenetic clustering in molecular epidemiological studies of HIV.

    This work was supported by the Medical Research Council (MR/J013862/1 to S.D.W.F.), the Economic and Social Research Council (ES/K003585/1 to S.D.W.F.), the Wellcome Trust (097410/Z/11/Z to D.P.), the Bill and Melinda Gates Foundation (OPP1084362 to D.P.), and the National Institute for Health Research (to D.P.). This is the final version of the article. It first appeared from Oxford University Press via http://dx.doi.org/10.1093/infdis/jiu56

    Measuring Asymmetry in Time-Stamped Phylogenies.

    Previous work has shown that asymmetry in viral phylogenies may be indicative of heterogeneity in transmission, for example due to acute HIV infection or the presence of 'core groups' with higher contact rates. Hence, evidence of asymmetry may provide clues to underlying population structure, even when direct information on, for example, stage of infection or contact rates is missing. However, current tests of phylogenetic asymmetry (a) suffer from false positives when the tips of the phylogeny are sampled at different times and (b) only test for global asymmetry, and hence suffer from false negatives when asymmetry is localised to part of a phylogeny. We present a simple permutation-based approach for testing for asymmetry in a phylogeny, where we compare the observed phylogeny with random phylogenies with the same sampling and coalescence times, to reduce the false positive rate. We also demonstrate how profiles of measures of asymmetry calculated over a range of evolutionary times in the phylogeny can be used to identify local asymmetry. In combination with different metrics of asymmetry, this combined approach offers detailed insights into how phylogenies reconstructed from real viral datasets may deviate from the simplistic assumptions of commonly used coalescent and birth-death process models. This work was supported by a Medical Research Council Methodology Research Programme grant to S.D.W.F. (grant number MR/J013862/1). This is the final version of the article. It first appeared from PLoS via http://dx.doi.org/10.1371/journal.pcbi.100431
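
    The permutation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the event encoding ("s" for a sampling event, "c" for a coalescence, listed backwards in time), the function names, and the choice of the Sackin index (sum of tip depths, counted in nodes) as the asymmetry statistic are all assumptions made for illustration. Only the order of events is used here, since under this null the topology depends on which lineages exist at each coalescence, not on the exact times.

```python
import random


def sackin_of_random_tree(events, rng):
    """Sackin index of one random tree that shares the given event order.

    events: sequence of "s" (a tip is sampled) and "c" (two lineages
    coalesce), ordered backwards in time. Hypothetical encoding.
    """
    extant = []  # each lineage is the list of depths of the tips below it
    for kind in events:
        if kind == "s":
            extant.append([0])
        elif kind == "c":
            # Null model: every pair of extant lineages is equally likely.
            i, j = rng.sample(range(len(extant)), 2)
            merged = [depth + 1 for depth in extant[i] + extant[j]]
            extant = [lin for k, lin in enumerate(extant) if k not in (i, j)]
            extant.append(merged)
        else:
            raise ValueError(f"unknown event type: {kind!r}")
    if len(extant) != 1:
        raise ValueError("events do not describe a single rooted tree")
    return sum(extant[0])


def permutation_p_value(observed, events, n_perm=999, seed=0):
    """One-sided p-value: how often a random tree is at least as asymmetric."""
    rng = random.Random(seed)
    null = [sackin_of_random_tree(events, rng) for _ in range(n_perm)]
    exceed = sum(1 for value in null if value >= observed)
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two tips sampled first, three sampled later (heterochronous sampling).
    events = ["s", "s", "c", "s", "s", "s", "c", "c", "c"]
    print(permutation_p_value(observed=12, events=events))
```

    Because the random trees are built from the same sequence of sampling and coalescence events as the observed tree, differences in the statistic reflect topology rather than the sampling scheme, which is the source of false positives the abstract describes.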

    Biased phylodynamic inferences from analysing clusters of viral sequences.

    Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a 'power law' behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We quantify the effect of using a falsely identified 'transmission cluster' of sequences to estimate phylodynamic parameters including the effective population size and exponential growth rate under several demographic scenarios. Our simulation studies show that taking the maximum size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant population size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again, the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources.

    Assigning and visualizing germline genes in antibody repertoires.

    Identifying the germline genes involved in immunoglobulin rearrangements is an essential first step in the analysis of antibody repertoires. Based on our prior work in analysing diverse recombinant viruses, we present IgSCUEAL (Immunoglobulin Subtype Classification Using Evolutionary ALgorithms), a phylogenetic approach to assign V and J regions of immunoglobulin sequences to their corresponding germline alleles, with D regions assigned using a simple pairwise alignment algorithm. We also develop an interactive web application for viewing the results, allowing the user to explore the frequency distribution of sequence assignments and CDR3 region length statistics, which is useful for summarizing repertoires, as well as a detailed viewer of rearrangements and region alignments for individual query sequences. We demonstrate the accuracy and utility of our method compared with sequence similarity-based approaches and other non-phylogenetic model-based approaches, using both simulated data and a set of evaluation datasets of human immunoglobulin heavy chain sequences. IgSCUEAL demonstrates the highest accuracy of V and J assignment amongst existing approaches, even when the reassorted sequence is highly mutated, and can successfully cluster sequences on the basis of shared V/J germline alleles. S.K.L.P. and B.M. were supported in part by the U.S. National Institutes of Health (AI110181, AI90970, AI100665, DA34978, GM93939, HL108460, GM110749, LM7092, MH97520, MH83552), the UCSD Center for AIDS Research (Developmental Grant, AI36214, Bioinformatics and Information Technologies Core), the International AIDS Vaccine Initiative (through AI90970), and the UC Laboratory Fees Research Program (grant no. 12-LR-236617). G.J.S. was supported in part by the U.S. National Institutes of Health (AI90118, AI68063, AI40305, and NIAID HHS N272201400019C), and a grant from the Lupus Research Institute. A.S.M.M.H. was supported by an Islamic Development Bank Scholarship, and S.D.W.F. was supported in part by the UK MRC Methodology Research Programme (grant no. MR/J013862/1). This is the final published version. It first appeared at http://rstb.royalsocietypublishing.org/content/370/1676/20140240

    Evaluation of the role of location and distance in recruitment in respondent-driven sampling.

    BACKGROUND: Respondent-driven sampling (RDS) is an increasingly widely used variant of a link tracing design for recruiting hidden populations. The role of the spatial distribution of the target population has not been robustly examined for RDS. We examine patterns of recruitment by location, and how they may have biased an RDS study's findings. METHODS: Total-population data were available on a range of characteristics on a population of 2402 male household-heads from an open cohort of 25 villages in rural Uganda. The locations of households were known a priori. An RDS survey was carried out in this population, employing current RDS methods of sampling and statistical inference. RESULTS: There was little heterogeneity in the population by location. Data suggested more distant contacts were less likely to be reported, and therefore recruited, but if reported, more distant contacts were as likely as closer contacts to be recruited. There was no evidence that closer proximity to a village meeting place was associated with probability of being recruited; however, it was associated with a higher probability of recruiting a larger number of recruits. People living closer to an interview site were more likely to be recruited. CONCLUSIONS: Household location affected the overall probability of recruitment, and the probability of recruitment by a specific recruiter. Patterns of recruitment do not appear to have greatly biased estimates in this study. The observed patterns could result in bias in more geographically heterogeneous populations. Care is required in RDS studies when choosing the network size question and interview site location(s). RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

    Disease control across urban–rural gradients

    Controlling the regional re-emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) after its initial spread in ever-changing personal contact networks and disease landscapes is a challenging task. In a landscape context, contact opportunities within and between populations are changing rapidly as lockdown measures are relaxed and a number of social activities re-activated. Using an individual-based metapopulation model, we explored the efficacy of different control strategies across an urban–rural gradient in Wales, UK. Our model shows that isolation of symptomatic cases or regional lockdowns in response to local outbreaks have limited efficacy unless the overall transmission rate is kept persistently low. Additional isolation of non-symptomatic infected individuals, who may be detected by effective test-and-trace strategies, is pivotal to reducing the overall epidemic size over a wider range of transmission scenarios. We define an ‘urban–rural gradient in epidemic size’ as a correlation between regional epidemic size and connectivity within the region, with more highly connected urban populations experiencing relatively larger outbreaks. For interventions focused on regional lockdowns, the strength of such gradients in epidemic size increased with higher travel frequencies, indicating a reduced efficacy of the control measure in the urban regions under these conditions. When both non-symptomatic and symptomatic individuals are isolated or regional lockdown strategies are enforced, we further found the strongest urban–rural epidemic gradients at high transmission rates. This effect was reversed for strategies targeted at symptomatic individuals only. Our results emphasize the importance of test-and-trace strategies and maintaining low transmission rates for efficiently controlling SARS-CoV-2 spread, both at landscape scale and in urban areas.

    Assessing Commitment and Reporting Fidelity to a Text Message-Based Participatory Surveillance in Rural Western Uganda.

    Syndromic surveillance, the collection of symptom data from individuals prior to or in the absence of diagnosis, is used throughout the developed world to provide rapid indications of outbreaks and unusual patterns of disease. However, the low cost of syndromic surveillance also makes it highly attractive for the developing world. We present a case study of electronic participatory syndromic surveillance, using participant-mobile phones in a rural region of Western Uganda, which has a high infectious disease burden, and frequent local and regional outbreaks. Our platform uses text messages to encode a suite of symptoms, their associated durations, and household disease burden, and we explore the ability of participants to correctly encode their symptoms, with an average of 75.2% of symptom reports correctly formatted between the second and 11th reporting timeslots. Concomitantly, we identify divisions between participants able to rapidly adjust to this unusually participatory style of data collection, and those few for whom the study proved more challenging. We then perform analyses of the resulting syndromic time series, examining the clustering of symptoms by time and household to identify patterns such as a tendency towards the within-household sharing of respiratory illness. National Institute of Health (Grant ID: TW009237). This is the final version of the article. It first appeared from the Public Library of Science via http://dx.doi.org/10.1371/journal.pone.015597